REMARKS

In view of the following discussion, the Applicants submit that none of the claims 
now pending in the application is anticipated under the provisions of 35 U.S.C. § 102 or 
made obvious under the provisions of 35 U.S.C. § 103. Thus, the Applicants believe
that all of these claims are now in allowable form. 

I. REJECTION OF CLAIMS 1, 5-7, 9, 13-16, 20, 22 AND 26-31 UNDER 35 U.S.C. § 102

The Examiner has rejected claims 1, 5-7, 9, 13-16, 20, 22 and 26-31 under 35
U.S.C. §102(a) as being anticipated by the Thrift et al. patent (US patent 6,188,985,
issued on February 13, 2001, hereinafter "Thrift"). In response, the Applicants have
amended independent claims 1, 9, 15, 16, 22, 30 and 31, from which claims 5-7, 13-14, 
20 and 26-29 depend, to more clearly recite aspects of the present invention. 

Thrift teaches a voice-activated device for controlling a processor-based host 
system (such as a computer connected to the World Wide Web). In one particular 
embodiment, Thrift teaches a voice-activated remote control that performs at least some 
voice recognition processing on an input user command and then outputs recognized 
speech to the host computer. The host computer then interprets the recognized speech 
for the purpose of executing the user command at the host computer. In some cases, 
the host computer may dynamically generate the grammar used by the remote control 
for speech recognition, based on Web pages and links that are currently displayed on 
the host computer. However, Thrift does not teach that the grammar used by the 
remote control for speech recognition is dynamically updated based on natural 
language processing of the input user command (e.g., speech signal). 

The Examiner's attention is directed to the fact that Thrift fails to disclose or 
suggest the novel invention of updating or adapting a speech recognition process based 
on natural language understanding processing of the speech signal being processed, as
claimed in Applicants' independent claims 1, 9, 15, 16, 22, 30 and 31. Specifically,
Applicants' claims 1, 9, 15, 16, 22, 30 and 31, as amended, positively recite:

1. Method for performing speech recognition, said method comprising the
steps of: 

(a) receiving a speech signal locally from a user via a client device; 

(b) performing speech recognition on said speech signal in accordance 
with an embedded speech recognizer of said client device to produce a
recognizable text signal, wherein said embedded speech recognizer employs a
language model and a natural language understanding module;

(c) adapting said performance of speech recognition based on at least one 
local parameter of said speech signal; 

(d) forwarding said recognizable text signal to a remote server; and 

(e) updating said language model by dynamically receiving an update from 
said remote server in accordance with said recognizable text signal. (Emphasis 
added) 



9. Method for performing speech recognition, said method comprising the 
steps of: 

(a) receiving a recognizable text signal representative of a user speech 
signal from a client device, wherein said recognizable text is generated using a 
speech recognizer having a language model and a natural language 
understanding module on said client device, and wherein said recognizable text 
is generated in accordance with adapting said performance of speech recognition 
based on at least one local parameter of said speech signal; 

(b) processing said recognizable text signal in accordance with a task 
model; and 

(c) forwarding a language model update to said client device in 
accordance with said recognizable text signal. (Emphasis added) 



15. A distributed system for performing speech recognition, said system 
comprising: 

a client device for receiving a speech signal locally from a user, said client 
device having an embedded speech recognizer with a language model and a 
natural language understanding module for performing speech recognition on 
said speech signal to produce a recognizable text signal, and wherein said 
embedded speech recognizer further adapts said performance of speech 
recognition based on at least one local parameter of said speech signal; and 

a remote server for receiving said recognizable text signal and forwarding 
a language model update to said client device in accordance with said 
recognizable text signal. (Emphasis added) 



16. A client device for performing speech recognition, said client device 
comprising: 

means for receiving a speech signal locally from a user;

means for performing speech recognition on said speech signal to 
produce a recognizable text signal, wherein said speech recognition means 
employs a language model and a natural language understanding module;

means for adapting said performance of speech recognition based on at 
least one local parameter of said speech signal; and 

means for forwarding said recognizable text signal to a remote server, and 
means for updating said language model by dynamically receiving an update 
from said remote server in accordance with said recognizable text signal. 
(Emphasis added) 



22. A server for performing speech recognition, said server comprising: 

means for receiving a recognizable text signal representative of a user 
speech signal from a client device, wherein said recognizable text is generated 
using a speech recognizer having a language model and a natural language 
understanding module on said client device, and wherein said recognizable text 
is generated in accordance with adapting said performance of speech recognition 
based on at least one local parameter of said speech signal; 

means for processing said recognizable text signal in accordance with a 
task model; and 

means for forwarding a language model update to said client device in 
accordance with said recognizable text signal. (Emphasis added) 



30. A computer-readable medium having stored thereon a plurality of 
instructions, the plurality of instructions including instructions which, when 
executed by a processor, cause the processor to perform the steps comprising 
of: 

(a) receiving a speech signal locally from a user via a client device; 

(b) performing speech recognition on said speech signal in accordance 
with an embedded speech recognizer of said client device to produce a 
recognizable text signal, wherein said embedded speech recognizer employs a 
language model and a natural language understanding module;

(c) adapting said performance of speech recognition based on at least one 
local parameter of said speech signal; 

(d) forwarding said recognizable text signal to a remote server; and 

(e) updating said language model by dynamically receiving an update from
said remote server in accordance with said recognizable text signal. (Emphasis 
added) 



31. A computer-readable medium having stored thereon a plurality of 
instructions, the plurality of instructions including instructions which, when 
executed by a processor, cause the processor to perform the steps comprising 

(a) receiving a recognizable text signal representative of a user speech
signal from a client device, wherein said recognizable text is generated using a 
speech recognizer having a language model and a natural language 
understanding module on said client device, and wherein said recognizable text 
is generated in accordance with adapting said performance of speech recognition 
based on at least one local parameter of said speech signal; 

(b) processing said recognizable text signal in accordance with a task 
model; and 

(c) forwarding a language model update to said client device in 
accordance with said recognizable text signal. (Emphasis added) 

Applicants' invention is directed to a method and apparatus for providing a 
dynamic speech-driven control and remote service access system, for example for use 
in connection with portable devices such as cell phones, pagers, personal digital 
assistants and the like. Many such portable devices rely at least in part on
speech-driven user interfaces, which do not require a great deal of physical space to
incorporate in an associated device. However, the small physical size of most portable 
devices also typically limits the processing power, and hence the robustness of a 
speech recognition system, that can be incorporated in a portable device. Thus, the 
processing demands of speech processing and recognition often exceed the processing 
capabilities of typical portable devices. 

The present invention provides a method and apparatus for improving the 
speech processing and recognition capabilities of a client device (e.g., a portable 
device) through interaction with a central server. In one embodiment, the client device 
is equipped with an initial language model for speech recognition that facilitates 
recognition of top-level user requests (e.g., general inquiries) through natural language 
understanding and/or local "speaker adaptation" (e.g., adaptation of local parameters 
such as environmental noise and speaker pronunciation). As the client device interacts 
with a user, the central server updates the client device's language model based on the 
client device's (semantic and/or pragmatic) understanding of the input speech, e.g., by 
providing more tailored language models when required for further processing of the 
user's input. This distributed approach maximizes the processing power of the client 
device without overburdening the client device unnecessarily with a computationally
complex language model, thus providing the client device with just enough data and 
information to perform the tasks required by the user. 

In contrast, Thrift only teaches a speech recognition grammar for remote control 
of a host that is dynamically generated based on the utterance of predefined keywords 
or phrases that are associated with specified uniform resource locators (URLs). In other
words, the dynamically generated grammar is based on the utterance of recognizable
keywords (e.g., valid commands based on the current display), and not on
understanding of the content or context of the user's speech.

The Applicants' invention positively claims the step of adapting the speech
recognition process based on natural language understanding/processing of the input
speech signal (e.g., understanding of contextual information in the input speech signal).
That is, the Applicants' invention is capable of extracting meaning from a recognized
speech signal. This allows the server to update the client device's language model
dynamically with contextually relevant information, as the client device interacts with a
user providing spoken input. For example, if the user's speech signal indicates that the
user needs to arrange for a taxi to the airport, the language model may be updated with
additional grammars that enable the client device to query the user for pick-up or
drop-off locations or times, flight times or the like. Thrift's system is completely devoid
of any teaching relating to the need or desire to understand the context of the input
speech signal; it merely produces additional grammar in response to the utterance of a
predefined keyword, without any semantic or pragmatic understanding of the utterance.
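
By way of illustration only, the taxi example above can be sketched in a few lines of
Python. The sketch is not taken from the specification or the claims. The names
understand, select_update, ClientRecognizer and TASK_GRAMMARS are hypothetical, the
grammars are invented for the example, and the simple word test inside understand is
only a placeholder for a real natural language understanding module. The sketch assumes
that the client forwards what it understood from the recognized text and that the server
answers with a language model update relevant to that context.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical task-specific grammars held by the server (illustrative only).
TASK_GRAMMARS: Dict[str, Set[str]] = {
    "book_taxi": {"pick-up location", "drop-off location", "pick-up time", "flight time"},
    "check_weather": {"city", "date"},
}


def understand(text: str) -> Optional[str]:
    """Placeholder for natural language understanding: map text to a task."""
    words = set(text.lower().split())
    if "taxi" in words or "cab" in words:
        return "book_taxi"
    if "weather" in words or "forecast" in words:
        return "check_weather"
    return None  # nothing understood, so no update is requested


def select_update(intent: Optional[str]) -> Set[str]:
    """Server side: pick a language model update for the understood task."""
    if intent is None:
        return set()
    return set(TASK_GRAMMARS.get(intent, set()))


@dataclass
class ClientRecognizer:
    """Client side: starts with a small, top-level language model."""
    language_model: Set[str] = field(default_factory=lambda: {"top-level request"})

    def handle(self, recognized_text: str) -> Optional[str]:
        intent = understand(recognized_text)          # local understanding
        self.language_model |= select_update(intent)  # update received from the server
        return intent


client = ClientRecognizer()
intent = client.handle("I need a taxi to the airport")
print(intent, sorted(client.language_model))
```

In this sketch the client's language model grows only after the request has been
understood as a taxi booking, which is the behavior the paragraph above contrasts with a
grammar produced merely because a predefined keyword was uttered.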

Therefore, the Applicants submit that, at least for the reasons presented above, 
independent claims 1, 9, 15, 16, 22, 30 and 31, as amended, fully satisfy the 
requirements of 35 U.S.C. §102 and are patentable thereunder. 

Dependent claims 5-7, 13-14, 20 and 26-29 depend from claims 1, 9, 15, 16 and 
22 and recite additional features therefor. As such, and for at least the same reasons
set forth above, the Applicants submit that claims 5-7, 13-14, 20 and 26-29 are not 
anticipated by the teachings of Thrift. Therefore, the Applicants submit that dependent 
claims 5-7, 13-14, 20 and 26-29 also fully satisfy the requirements of 35 U.S.C. §102 
and are patentable thereunder.

II. REJECTION OF CLAIMS 2-4, 10-12, 17-19 AND 23-25 UNDER 35 U.S.C. § 103

The Examiner rejected claims 2-4, 10-12, 17-19 and 23-25 under 35 U.S.C.
§103(a) as being unpatentable over Thrift in view of the Balakrishnan et al. patent (U.S.
Patent No. 6,182,038, issued January 30, 2001, hereinafter Balakrishnan). In response,
the Applicants have amended independent claims 1, 9, 15, 16 and 22, from which 
claims 2-4, 10-12, 17-19 and 23-25 depend, as discussed above to more clearly recite 
aspects of the invention. 

Thrift has been discussed above. 

Balakrishnan teaches a computer speech recognition system that operates 
independent of an application's vocabulary and language models. In particular, 
Balakrishnan teaches a method for generating context-dependent (CD) phoneme 
networks as an intermediate speech recognition step. These CD phoneme networks 
are generated from acoustic models and are specific to a user and environment. A 
user's CD phoneme network may then be provided to an application having an 
independent vocabulary and language model. Thus, final speech recognition is 
performed at the application in accordance with an application-specific language model. 
However, Balakrishnan, like Thrift, does not teach that a grammar used for speech 
recognition is dynamically updated based on natural language processing of the input 
user command (e.g., speech signal), as claimed in Applicants' independent claims 1, 9,
15, 16 and 22, which have been recited above. 

As discussed above, the Applicants' invention provides a method and apparatus 
for improving the speech processing and recognition capabilities of a client device (e.g., 
a portable device) through interaction with a central server. As the client device 
interacts with a user, the central server updates the client device's initial language 
model as necessary, e.g., by providing more tailored language models (e.g., beyond the 
initial language model) when required for processing the user's input. This distributed 
approach maximizes the processing power of the client device by providing the client 
device with just enough data and information to perform the tasks required by the user. 

In contrast, neither Thrift nor Balakrishnan teaches updating or adapting a
speech recognition process based on natural language understanding processing of the
speech signal being processed, as positively recited in Applicants' claims 1, 9, 15, 16
and 22. As discussed above, this allows the server to update the client device's
language model dynamically and with contextually relevant information, as the client
device interacts with a user providing spoken input. Thus, at least for the reasons
presented above, independent claims 1, 9, 15, 16 and 22 are not made obvious by the
teachings of Thrift in view of Balakrishnan. 

Dependent claims 2-4, 10-12, 17-19 and 23-25 depend from claims 1, 9, 15, 16 
and 22, and recite additional features therefor. As such, and for at least the same
reasons set forth above, the Applicants submit that claims 2-4, 10-12, 17-19 and 23-25
are not made obvious by the teachings of Thrift in view of Balakrishnan. Therefore, the
Applicants submit that dependent claims 2-4, 10-12, 17-19 and 23-25 also fully satisfy 
the requirements of 35 U.S.C. §103 and are patentable thereunder. 

III. REJECTION OF CLAIMS 8 AND 21 UNDER 35 U.S.C. § 103

The Examiner rejected claims 8 and 21 under 35 U.S.C. §103(a) as being
unpatentable over Thrift in view of the Ramaswamy et al. patent (U.S. Patent No.
6,490,560, issued December 3, 2002, hereinafter Ramaswamy). In response, the 
Applicants have amended independent claims 1 and 16, from which claims 8 and 21 
depend, as discussed above to more clearly recite aspects of the invention. 

Thrift has been discussed above. 

Ramaswamy teaches a natural language understanding system for verifying the 
identity of a speaker. In particular, Ramaswamy teaches a method for comparing an 
input behavior from a speaker speech signal with a behavior model to determine 
whether the speaker is authorized to interact with the system. Aspects of the behavior 
that are relevant for the purposes of speaker verification include how a user typically 
greets the system or what tasks the user typically asks the system to perform. In one 
embodiment, a user's behavior model may include a language model that is 
personalized for the particular user and stored as a personal cache. However, 
Ramaswamy, like Thrift, does not teach that a grammar used for speech recognition is
dynamically updated based on natural language processing of the input user command 
(e.g., speech signal). 

As discussed above, the Applicants' invention provides a method and apparatus 
for improving the speech processing and recognition capabilities of a client device (e.g., 
a portable device) through interaction with a central server. As the client device 
interacts with a user, the central server updates the client device's initial language 
model as necessary, e.g., by providing more tailored language models (e.g., beyond the 
initial language model) when required for processing the user's input. This distributed 
approach maximizes the processing power of the client device by providing the client 
device with just enough data and information to perform the tasks required by the user. 

In contrast, neither Thrift nor Ramaswamy teaches updating or adapting a 
speech recognition process based on natural language understanding processing of the 
speech signal being processed, as positively recited in Applicants' claims 1 and 16. As
discussed above, this allows the server to update the client device's language model
dynamically and with contextually relevant information, as the client device interacts
with a user providing spoken input. Thus, at least for the reasons presented above,
independent claims 1 and 16 are not made obvious by the teachings of Thrift in view of 
Ramaswamy. 

Dependent claims 8 and 21 depend, respectively, from claims 1 and 16, and 
recite additional features therefor. As such, and for at least the same reasons set forth
above, the Applicants submit that claims 8 and 21 are not made obvious by the 
teachings of Thrift in view of Ramaswamy. Therefore, the Applicants submit that 
dependent claims 8 and 21 also fully satisfy the requirements of 35 U.S.C. §103 and are 
patentable thereunder. 

IV. NEW CLAIMS 

New claims 32, 36, 40 and 44 present original elements of independent claims 1, 
9, 16 and 22, respectively, in dependent form. New claims 33-35, 37-39, 41-43 and 45-47
present cancelled claims 2-4, 10-12, 17-19 and 23-25, respectively, in amended
form, in order to maintain proper dependency from new claims 32, 36, 40 and 44.

V. CONCLUSION

Thus, the Applicants submit that all of the presented claims now fully satisfy the 
requirements of 35 U.S.C. §102 and §103. Consequently, the Applicants believe that all 
these claims are presently in condition for allowance. Accordingly, both reconsideration
of this application and its swift passage to issue are earnestly solicited. 

If, however, the Examiner believes that there are any unresolved issues requiring 
the maintenance of the present final action in any of the claims now pending in the 
application, it is requested that the Examiner telephone Mr. Kin-Wah Tong, Esq. at
(732) 530-9404 so that appropriate arrangements can be made for resolving such 
issues as expeditiously as possible. 



Respectfully submitted, 




Date 




Reg. No. 39,400 
(732) 530-9404 



Moser, Patterson & Sheridan, LLP 
595 Shrewsbury Avenue 
Shrewsbury, New Jersey 07702 